Medical Image Analysis
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Medical Image Analysis's content profile, based on 33 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Xu, R.; Jiang, S.; Zhai, Y.; Chen, Y.
Background: Segmentation of the left ventricular myocardium, left ventricular cavity, and right ventricular cavity on short-axis cine cardiac magnetic resonance (CMR) images is essential for quantifying cardiac structure and function. However, existing automated segmentation tools are limited by small training datasets, narrow disease coverage, restrictive input format requirements, and the absence of anatomical plausibility constraints, hindering their clinical adoption. Methods: We constructed the largest annotated CMR short-axis segmentation dataset to date, comprising 1,555 subjects from 12 centers with five cardiac disease types and full cardiac cycle annotations totaling 319,175 labeled images. A MedNeXt-L model was trained using a 2D slice-by-slice strategy with full field-of-view input, eliminating dependencies on 3D volumes, temporal sequences, or region-of-interest (ROI) localization. A deterministic three-step post-processing pipeline was designed to enforce anatomical priors: connected component constraint, containment relationship constraint, and gap-filling constraint. The model was validated on an internal test set (310 subjects) and three independent public external datasets (ACDC, M&Ms1, and M&Ms2; 855 subjects from 6 additional centers across 3 countries), spanning 15 cardiac disease categories, 10 of which were never encountered during training. Results: The model achieved mean Dice similarity coefficients (DSC) of 0.913 ± 0.037 and 0.911 ± 0.040 on internal and external test sets, respectively, with a cross-domain performance gap of only 0.002. Post-processing eliminated all containment violations (7.5% → 0%) and gap errors (1.8% → 0%) while reducing fragment rates by 85.5% (9.0% → 1.3%). Zero-shot generalization to 10 unseen disease categories yielded DSC values ranging from 0.899 to 0.921.
Automated clinical functional parameters demonstrated excellent agreement with manual measurements for left ventricular indices and right ventricular volumes (intraclass correlation coefficients ≥ 0.977). Conclusions: CorSeg-CineSAX provides a robust, open-source framework for fully automatic CMR short-axis segmentation across diverse clinical scenarios. All source code and pre-trained weights are publicly available at https://github.com/RunhaoXu2003/CorSeg.
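The first step of the anatomical post-processing described above, the connected component constraint, can be sketched with scipy.ndimage; this is an illustrative reimplementation of the general technique (keep only the largest connected region of each class mask), not the authors' released code.

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask):
    """Connected component constraint: retain only the largest
    connected region of a binary mask, discarding stray fragments
    (illustrative sketch, not the CorSeg-CineSAX implementation)."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, np.arange(1, n + 1))
    return (labeled == (int(np.argmax(sizes)) + 1)).astype(mask.dtype)

# Toy example: one large blob plus a one-pixel fragment
m = np.zeros((8, 8), dtype=np.uint8)
m[1:5, 1:5] = 1   # main component, 16 pixels
m[6, 6] = 1       # spurious fragment
clean = keep_largest_component(m)
```

The containment and gap-filling constraints would operate analogously on the multi-class label map, for example by checking that the cavity label lies inside the myocardium label and filling enclosed holes.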
Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.
Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.
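The Maximum Mean Discrepancy used above to align synthetic and real latent distributions has a standard RBF-kernel estimator; the following is a generic numpy sketch of that statistic (the bandwidth and sample shapes are illustrative, not taken from the paper).

```python
import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    """Biased (V-statistic) MMD^2 estimate between sample sets X and Y
    under an RBF kernel; small values indicate matching distributions."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd_rbf(rng.normal(size=(200, 4)), rng.normal(size=(200, 4)))
shifted = mmd_rbf(rng.normal(size=(200, 4)), rng.normal(3.0, 1.0, size=(200, 4)))
```

Minimizing such a statistic between synthetic latents and a class-specific latent bank is the alignment objective the QLR module's residual correction is guided by.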
Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.
Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
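The barycentric idea can be seen in its simplest setting: in one dimension the Wasserstein-2 barycenter of distributions is obtained by averaging their quantile functions. The toy sketch below is unrelated to the paper's hierarchical priors; it only illustrates the geometric object being generalized.

```python
import numpy as np

def w2_barycenter_1d(samples, weights, grid=np.linspace(0.01, 0.99, 99)):
    """Wasserstein-2 barycenter of 1-D empirical distributions:
    the weighted average of their quantile functions on a probability grid."""
    quantiles = np.stack([np.quantile(s, grid) for s in samples])
    return np.average(quantiles, axis=0, weights=weights)

rng = np.random.default_rng(1)
a = rng.normal(-2.0, 1.0, 5000)
b = rng.normal(+2.0, 1.0, 5000)
# Equal-weight barycenter of N(-2,1) and N(+2,1) is approximately N(0,1):
# it interpolates the geometry rather than mixing the two modes.
bary = w2_barycenter_1d([a, b], weights=[0.5, 0.5])
```

Note the contrast with a 50/50 mixture, which would be bimodal: the barycenter preserves the unimodal shape, which is the geometric property the paper aims to retain in the multimodal latent space.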
Taherkhani, M.; Pizzolato, M.; Morup, M.; Dyrby, T. B.
Diffusion-weighted magnetic resonance imaging (dMRI) is used to study white matter microstructure and to delineate pathways by estimating fiber orientation distributions (FODs). Symmetric FODs represent the conventional model assuming antipodal symmetry in water diffusion. However, in complex regions with bending, branching or fanning fibers, this assumption is not guaranteed. To better capture such underlying fiber geometries, asymmetric FODs (A-FODs), derived from neighboring FODs, have been introduced. Here, we propose an Encoder-based Curvature-Aware Regularization (EnCAR) method for estimating A-FODs. Incorporating curvature features into the regularization weight applied to neighboring voxels improves reconstruction of A-FODs. A self-supervised Transformer network, combined with a Spherical Harmonics Semantic Encoder, learns region-specific regularization parameters from this local neighborhood to capture the diversity of fiber geometries across the brain. The EnCAR method was verified on the DiSCo challenge phantom, and applied to in vivo multi-shell human data. The model estimated sharp, high-angular-resolution A-FODs that were well aligned with local fiber pathways. Compared with established FOD and A-FOD methods, it performed on par in regions dominated by symmetric FODs and outperformed them in complex asymmetric regions. Quantitative evaluation using the Asymmetry Index (ASI) and Model Discrepancy Index (MDI) confirmed improved consistency with the underlying diffusion signals. By ensuring smooth directional transitions, this work enhances the visibility of continuous fiber segments.
Barkhau, C. B. C.; Mahjoory, K.; Brenner, M.; Weber, E.; Leenings, R.; Pellengahr, C.; Winter, N. R.; Konowski, M.; Straeten, T.; Meinert, S.; Leehr, E. J.; Flinkenfluegel, K.; Borgers, T.; Grotegerd, D.; Meinert, H.; Hubbert, J.; Jurishka, C.; Krieger, J.; Ringels, W.; Stein, F.; Thomas-Odenthal, F.; Usemann, P.; Teutenberg, L.; Nenadic, I.; Straube, B.; Alexander, N.; Jansen, A.; Jamalabadi, H.; Kircher, T.; Junghoefer, M.; Dannlowski, U.; Hahn, T.
Modeling individual brain dynamics from resting-state fMRI (rs-fMRI) remains challenging due to substantial inter-subject variability, measurement noise, and limited data length per subject. Here, we systematically evaluate a hierarchical dynamical systems framework based on shallow piecewise-linear recurrent neural networks (shPLRNNs) for individualized modeling of rs-fMRI data, with a particular focus on reproducing subject-specific functional connectivity (FC). We applied the framework to 1,423 rs-fMRI samples from healthy participants of the Marburg-Münster Affective Disorders Cohort Study (MACS). Simulated rs-fMRI data robustly reproduced empirical FC patterns, with comparable reconstruction accuracy on training and independent validation sets. Generalization to unseen individuals was heterogeneous and strongly depended on how typical a subject's connectivity pattern was relative to the training cohort, with template similarity explaining 37% of variance in reconstruction accuracy. Learned subject-specific parameters exhibited significant test-retest stability and higher within-subject than between-subject similarity on longitudinal data from two different timepoints, supporting their interpretation as individualized dynamical markers. Associations between individual parameters and demographic or cognitive variables were statistically significant but modest in effect size, and predictive performance remained below that obtained using empirical rs-fMRI features directly. Together, these results demonstrate that hierarchical shPLRNNs can extract meaningful and stable individual-specific dynamical structure from rs-fMRI data, while highlighting current limitations in capturing fine-grained individual differences. The findings delineate key trade-offs between model expressivity, generalization and subject specificity, and point to directions for future methodological refinement in individualized brain modeling.
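The shallow PLRNN family referenced above combines diagonal linear latent dynamics with a one-hidden-layer ReLU expansion. The sketch below shows one latent update in that generic textbook form with random illustrative weights; it is an assumption-laden stand-in, not the study's fitted hierarchical model.

```python
import numpy as np

def shplrnn_step(z, A_diag, W1, W2, h1, h2):
    """One latent update of a shallow piecewise-linear RNN:
    z' = A z + W1 * ReLU(W2 z + h2) + h1, with diagonal A.
    Generic form; weights here are random, not fitted to data."""
    return A_diag * z + W1 @ np.maximum(W2 @ z + h2, 0.0) + h1

rng = np.random.default_rng(2)
d, m = 5, 16                        # latent dimension, hidden expansion
A_diag = 0.3 * np.ones(d)           # stable diagonal self-connections
W1 = 0.01 * rng.normal(size=(d, m)) # weak nonlinear coupling, keeps dynamics contracting
W2 = rng.normal(size=(m, d))
h1, h2 = np.zeros(d), np.zeros(m)

z = rng.normal(size=d)
traj = [z]
for _ in range(100):
    traj.append(shplrnn_step(traj[-1], A_diag, W1, W2, h1, h2))
traj = np.asarray(traj)
```

In the study's hierarchical setting, the subject-specific parameters of such an update rule are the "individualized dynamical markers" whose stability and predictive value are assessed.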
Marques dos Santos, J. D.; Ramos, M. B.; Reis, L. P.; Marques dos Santos, J. P.; Direito, B.
The application of artificial intelligence (AI) to functional magnetic resonance imaging (fMRI) has gained increasing attention due to its ability to model complex, high-dimensional brain data and capture nonlinear patterns of neural activity. However, deep learning architectures, such as Graph Neural Networks (GNNs), typically require large sample sizes to achieve stable convergence, limiting their applicability in neuroimaging contexts where data are often scarce. This challenge highlights the need for compact, data-efficient models that maintain predictive performance and interpretability. Shallow neural networks (SNNs) have demonstrated robustness in low-sample settings but commonly rely on region-level features that treat brain areas independently, overlooking the brain's intrinsically network-based organization. To address this limitation, we propose a structurally constrained message-passing framework that integrates diffusion tensor imaging (DTI)-derived structural connectivity with region-level fMRI signals within a shallow architecture. This approach enables network-level modeling while preserving the stability and data efficiency of SNNs. The method is evaluated on 30 subjects performing a Theory of Mind (ToM) task from the Human Connectome Project Young Adult dataset. A baseline SNN achieved global accuracies of 88.2% (fully connected), 80.0% (pruned), and 84.7% (retrained), while the proposed model achieved 87.1%, 77.6%, and 84.7%, respectively. Although structural constraints led to a more pronounced performance decrease after pruning, retraining restored accuracy to baseline levels, demonstrating that biological constraints can be incorporated without compromising predictive validity. Model interpretability was assessed using SHAP (Shapley Additive Explanations).
While the baseline model primarily identified isolated regions as key contributors, the proposed framework revealed distributed, structurally coherent networks as the main drivers of classification. These networks showed correspondence with established ToM regions, including the temporo-parietal junction, superior temporal sulcus, and inferior frontal gyrus. Importantly, the findings suggest that groups of moderately informative regions can collectively form highly relevant subnetworks. Overall, the proposed framework achieves competitive performance in a limited dataset while incorporating graph-inspired message passing into a shallow architecture. Its explainability provides insight into how structurally constrained networks support stimulus-driven responses in ToM and demonstrates potential for investigating network dysfunction in disorders such as Alzheimer's disease, ADHD, autism spectrum disorder, bipolar disorder, mild cognitive impairment, and schizophrenia.
Wei, Y.; Smith, S. M.; Gohil, C.; Huang, R.; Griffin, B.; Cho, S.; Adaszewski, S.; Fraessle, S.; Woolrich, M. W.; Farahibozorg, S.-R.
Dynamic functional connectivity (dFC) models have become increasingly popular over the past decade for characterising time-varying interactions between brain regions. However, assessing and comparing dFC models remains challenging. Here, we introduce bi-cross-validation as a general framework for evaluating dFC models and selecting key hyperparameters, such as the number of states. By jointly partitioning the data across subjects and brain regions, bi-cross-validation enables out-of-sample evaluation without re-estimating latent states on the same data used for testing, thereby avoiding circularity. Using simulated data with known ground-truth dynamics, we show that bi-cross-validation favours models that accurately capture the underlying state structure. Applying the framework to real resting-state fMRI data, we demonstrate that bi-cross-validation naturally balances goodness-of-fit against model complexity, with performance improving and then declining as model complexity increases. Finally, we use bi-cross-validation to directly compare static and dynamic FC models, showing that dynamic models underperform static models at low spatial dimensionality, but outperform static models at sufficiently high dimensionality. Together, these results establish bi-cross-validation as a principled tool for dFC model selection, evaluation, and comparison.
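The core bi-cross-validation idea, holding out a subject × region block and predicting it from the remaining three blocks, can be illustrated on a plain low-rank matrix. This is a schematic Owen-and-Perry-style sketch with a toy factor model, not the dFC models evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: 40 "subjects" x 30 "regions", rank-2 signal plus noise
U = rng.normal(size=(40, 2))
V = rng.normal(size=(2, 30))
X = U @ V + 0.1 * rng.normal(size=(40, 30))

def bcv_error(X, rows, cols, k):
    """Bi-cross-validation error for a rank-k model: hold out the
    (rows x cols) block D and predict it as C A_k^+ B from the
    other three blocks, so no latent structure is fit on D."""
    r = np.zeros(X.shape[0], dtype=bool); r[list(rows)] = True
    c = np.zeros(X.shape[1], dtype=bool); c[list(cols)] = True
    A = X[np.ix_(~r, ~c)]   # training block
    B = X[np.ix_(~r, c)]    # training rows, held-out columns
    C = X[np.ix_(r, ~c)]    # held-out rows, training columns
    D = X[np.ix_(r, c)]     # fully held-out block
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    A_pinv_k = vt[:k].T @ np.diag(1.0 / s[:k]) @ u[:, :k].T
    return float(np.mean((D - C @ A_pinv_k @ B) ** 2))

errors = {k: bcv_error(X, rows=range(30, 40), cols=range(20, 30), k=k)
          for k in (1, 2, 8)}
```

Sweeping the model order k (here the rank; in the paper, e.g. the number of states) and picking the minimizer is the hyperparameter-selection use of the framework.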
Avaria-Saldias, R. H.; Ortiz, D.; Palma-Espinosa, J.; Cancino, A.; Cox, P.; Salas, R.; Chabert, S.
Accurate characterisation of the haemodynamic response function (HRF) is central to interpreting blood-oxygen-level-dependent (BOLD) signals in functional magnetic resonance imaging, yet standard estimation approaches remain centred around phenomenological formulations lacking biophysical grounding. We present a physics-informed neural network (PINN) framework that bridges these paradigms by embedding the Balloon-Windkessel model directly into the training objective of a multi-headed Neural Network. Our approach simultaneously estimates probable latent neurovascular state variables such as cerebral blood inflow, metabolic rate of oxygen consumption, blood volume, and deoxyhaemoglobin content, through an indirect optimisation scheme in which the predicted BOLD signal is obtained via convolution of the estimated HRF with experimental stimuli. Training is governed by a composite loss, balancing differential-equation residuals, physiological initial conditions and data fidelity. In simulations with temporal signal-to-noise ratios representative of clinical acquisitions, the framework recovered ground-truth state variables with coefficients of determination exceeding 0.99 and mean squared errors below 10^-3, at a physics-to-data weighting of 0.40:0.60. Application to 1.5 T block-design fMRI data from an ischaemic stroke patient yielded physiologically plausible, subject-specific HRF estimates, establishing feasibility of single-subject, physics-constrained HRF inference without reliance on fixed gamma basis assumptions. To our knowledge, this constitutes the first deployment of a single PINN incorporating the full Balloon-Windkessel model within an indirect training objective that reconstructs full BOLD observations, positioning PINN-based haemodynamic modelling as a principled and personalised route towards more interpretable and patient-specific fMRI biomarkers.
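The indirect scheme above (predict BOLD by convolving an estimated HRF with the stimulus, then score a physics-weighted composite loss) can be sketched as follows. A canonical double-gamma HRF stands in for the network's estimate purely for illustration; the paper explicitly avoids fixed gamma assumptions, and all parameters below are illustrative.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.0                      # repetition time in seconds (illustrative)
t = np.arange(0, 30, TR)

# Double-gamma HRF as a stand-in for the network's estimated HRF
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
hrf /= np.abs(hrf).max()      # unit peak

# Block-design stimulus: 20 s on / 20 s off, three cycles
stim = np.tile(np.r_[np.ones(20), np.zeros(20)], 3)

# Predicted BOLD = stimulus convolved with the estimated HRF
bold_pred = np.convolve(stim, hrf)[: len(stim)]

def composite_loss(physics_residual, data_residual, w_phys=0.40, w_data=0.60):
    """Composite objective with the physics-to-data weighting reported (0.40:0.60):
    ODE residuals of the haemodynamic model plus BOLD data fidelity."""
    return w_phys * np.mean(physics_residual ** 2) + w_data * np.mean(data_residual ** 2)
```

In the actual framework the physics residual comes from the Balloon-Windkessel differential equations evaluated on the predicted state variables; here it is left as an abstract argument.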
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Federated learning (FL) enables collaborative model training across institutions without sharing patient-level data. However, standard FL algorithms such as FedAvg degrade under non-independently and non-identically distributed (non-IID) data, a prevalent condition when patient demographics, scanner hardware, and disease prevalence differ across hospital sites. Objective. We propose iPS-MFFL (Individualized Per-Site Meta-Federated Feature Learning), a federated framework with a hierarchical local-model architecture that addresses non-IID heterogeneity through (1) a shared feature extractor, (2) multiple weak-learner classification heads that can be trained with heterogeneous training objectives to promote complementary decision boundaries, (3) independent per-learner server aggregation so that each weak learner's parameters are averaged only with its counterparts at other clients, and (4) a lightweight meta-model, itself federated, that adaptively stacks the weak-learner outputs. Methods. We evaluate on the Brain Tumor MRI Classification dataset (7,200 images; 4 classes: glioma, meningioma, pituitary tumor, no tumor) partitioned across K = 5 simulated hospital sites using Dirichlet non-IID sampling (alpha = 0.3). Four baselines are compared: Local-only training, FedAvg, FedProx, and Freeze-FT. All experiments are repeated over three random seeds (13, 42, 2025) and evaluated using paired t-tests, Cohen's d effect sizes, and post-hoc power analysis.
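Component (3) above, independent per-learner server aggregation, amounts to FedAvg-style weighted averaging applied key by key, so each weak-learner head is only averaged with its counterparts at other clients while the shared extractor is averaged globally. A minimal sketch with plain weight dictionaries (the dictionary structure is hypothetical, not the authors' code):

```python
import numpy as np

def aggregate(clients, sizes):
    """Data-size-weighted FedAvg, applied per parameter key so that
    'head0' at one client is only averaged with 'head0' elsewhere
    (illustrative sketch of independent per-learner aggregation)."""
    total = sum(sizes)
    return {key: sum(n * c[key] for n, c in zip(sizes, clients)) / total
            for key in clients[0]}

# Two simulated hospital sites, each with a shared extractor and two heads
c1 = {"extractor": np.ones(3), "head0": np.array([1.0]), "head1": np.array([3.0])}
c2 = {"extractor": np.zeros(3), "head0": np.array([3.0]), "head1": np.array([5.0])}
global_model = aggregate([c1, c2], sizes=[100, 300])  # site 2 has 3x the data
```

The federated meta-model of component (4) would then be trained, and aggregated the same way, on the stacked outputs of the averaged heads.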
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Adult diffuse glioma is a representative class of primary brain tumors for which accurate MRI-based tumor segmentation is indispensable for treatment planning. Conventional automated segmentation methods have relied primarily on image information and spatial prompts, and auxiliary clinical information that is routinely acquired in clinical practice has not been sufficiently exploited as an input. Objective. Building on a dual-prompt-driven Segment Anything Model (SAM) extension framework that fuses visual and language reference prompts, we propose a method that integrates patient demographics, unsupervised molecular cluster variables derived from TCGA high-throughput profiling, and histopathological parameters as learnable prompt embeddings, and we evaluate its effect on the accuracy of lower-grade glioma (LGG) MRI segmentation. Methods. An auxiliary prompt encoder converts clinical metadata into high-dimensional embeddings that are fused with the prompt representations of Segment Anything Model (SAM) ViT-B through a cross-attention fusion mechanism. The TCGA-LGG MRI Segmentation dataset (Kaggle release by Buda et al.; n = 110 patients; WHO grade II-III) was split at the patient level (train/val/test = 71/17/22) using three different random seeds, and the three slices with the largest tumor area were extracted from each patient. To avoid pseudo-replication arising from multiple slices per patient and repeated measurements across seeds, our primary analysis aggregated Dice and 95th-percentile Hausdorff distance (HD95) to the patient × seed unit (n = 66); secondary analyses at the unique-patient level (n = 22) and at the per-slice level (n = 198) are also reported. Pairwise comparisons used paired t-tests with Bonferroni correction (k = 3) and Wilcoxon signed-rank tests, and a permutation test (K = 30) served as an auxiliary check of effective use of the auxiliary information. Results.
At the patient × seed level (n = 66), Proposed (full clinical) achieved a Dice gain of +0.287 over the zero-shot SAM ViT-B baseline (paired-t p = 4.2 × 10^-15, Cohen's d_z = +1.25, Bonferroni-corrected p << 0.001; Wilcoxon p = 2.0 × 10^-10), and HD95 improved from 218.2 to 64.6. Because zero-shot SAM is not designed for domain-specific medical segmentation, the large absolute HD95 gap largely reflects the expected domain gap rather than a competitive baseline. The additional contribution of the full clinical configuration over the demographics-only configuration was Dice = +0.023 (paired-t p = 0.057, Bonferroni-corrected p = 0.172), which did not reach statistical significance at the patient level and is reported as a directional trend. The permutation test (K = 30, seed 2025) yielded real-metadata Dice = 0.819 versus a shuffled-metadata mean of 0.773, giving an empirical p = 0.032 = 1/(K + 1), which is at the resolution limit of this test and should therefore be interpreted as preliminary evidence. Conclusions. Integrating auxiliary clinical information as multimodal prompts produced a large improvement over the zero-shot SAM baseline on this LGG cohort. More importantly, a robustness analysis showed that Proposed (full clinical) outperformed the trained Base (no auxiliary information) under all tested spatial-prompt conditions, including perfect centroid (+0.014), and that the advantage was most pronounced in the prompt-free regime (+0.231, p = 0.039), where the base model collapsed but the proposed model maintained meaningful segmentation by leveraging clinical metadata alone. The additional contribution of molecular and histopathological information beyond demographics was not statistically resolved at the patient level (+0.023, n.s.). Establishing clinical utility will require external validation on larger multi-center cohorts and direct comparisons with established segmentation methods.
Keywords: brain tumor segmentation; Segment Anything Model (SAM); vision-language prompt-driven segmentation; auxiliary clinical prompts; multimodal learning; TCGA-LGG; deep learning
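The statistical workflow above (paired t-tests with Bonferroni correction over k = 3 comparisons, plus Cohen's d_z for paired designs) is standard and can be reproduced with scipy. The paired scores below are simulated to mimic the reported effect size; they are not the paper's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated per-unit Dice scores for two methods (n = 66, matching the
# paper's patient x seed analysis unit; values are synthetic)
base = rng.normal(0.55, 0.10, 66)
prop = base + rng.normal(0.29, 0.08, 66)   # roughly +0.29 mean paired gain

t_stat, p = stats.ttest_rel(prop, base)    # paired t-test
diff = prop - base
d_z = diff.mean() / diff.std(ddof=1)       # Cohen's d_z: mean paired diff / SD of diffs
p_bonf = min(p * 3, 1.0)                   # Bonferroni over k = 3 comparisons
```

With a consistent per-unit gain the paired test is far more sensitive than an unpaired comparison, which is why the abstract reports paired statistics on the patient × seed unit.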
Bai, B.; Shih, T.-C.; Miyata, K.
Vision-language models (VLMs) provide a unified framework for multimodal reasoning, yet their representations are primarily learned from natural image-text corpora and often exhibit semantic misalignment when transferred to histopathology, particularly under data-limited diagnostic settings. To address this limitation, we propose HistoSB-Net, a semantic bridging network designed to adapt pre-trained VLMs to multimodal histopathological diagnosis while preserving their original semantic structure. HistoSB-Net introduces a constrained semantic bridging (CSB) module that operates within the self-attention projection space of both vision and text encoders. Instead of employing explicit cross-attention or full fine-tuning, CSB adaptively modulates pre-trained attention projections through a lightweight nonlinear semantic bottleneck, enabling structured cross-modal regulation with limited additional parameters. The framework supports both patch-level and whole-slide image (WSI)-level diagnosis within a unified architecture. Experiments on six pathology benchmarks, comprising two WSI-level and four patch-level datasets, demonstrate consistent improvements over zero-shot inference across 36 backbone-dataset combinations under limited supervision. Further analysis of prototype-based margin distributions and confusion matrices shows that these improvements are accompanied by enhanced intra-class compactness and increased inter-class separation in the embedding space. These results indicate that CSB provides an effective and computationally manageable strategy for adapting pre-trained VLMs to data-limited digital pathology tasks.
Roca, M.; Messuti, G.; Klepachevskyi, D.; Angiolelli, M.; Bonavita, S.; Trojsi, F.; Demuru, M.; Troisi Lopez, E.; Chevallier, S.; Yger, F.; Saudargiene, A.; Sorrentino, P.; Corsi, M.-C.
Neurodegenerative diseases such as Mild Cognitive Impairment (MCI), Multiple Sclerosis (MS), Parkinson's Disease (PD), and Amyotrophic Lateral Sclerosis (ALS) are becoming more prevalent. Each of these diseases, despite its specific pathophysiological mechanisms, leads to widespread reorganization of brain activity. However, the corresponding neurophysiological signatures of these changes have been elusive. As a consequence, to date, it is not possible to effectively distinguish these diseases from neurophysiological data alone. This work uses Magnetoencephalography (MEG) resting-state data, combined with interpretable machine learning techniques, to support differential diagnosis. We expand on previous work and design a Riemannian geometry-based classification pipeline. The pipeline is fed with typical connectivity metrics, such as covariance or correlation matrices. To maintain interpretability while reducing feature dimensionality, we introduce a classifier-independent feature selection procedure that uses effect sizes derived from the Kruskal-Wallis test. The ensemble classification pipeline, called REDDI, achieved a mean balanced accuracy of 0.81 (+/-0.04) across five folds, representing a 13% improvement over the state-of-the-art, while remaining clinically transparent. As such, our approach provides a reliable, interpretable, data-driven, operator-independent decision-support tool for neurology.
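Classifier-independent feature selection via Kruskal-Wallis effect sizes can be sketched with the common epsilon-squared statistic, H / (n - 1); the exact effect-size formula used by the authors may differ, and the grouped data below are synthetic.

```python
import numpy as np
from scipy import stats

def kw_effect_sizes(groups):
    """Rank features by a Kruskal-Wallis effect size (epsilon-squared,
    H / (n - 1)). Each element of `groups` is an (n_g, n_features) array
    for one diagnostic group. Illustrative sketch, not the REDDI code."""
    n_features = groups[0].shape[1]
    n_total = sum(len(g) for g in groups)
    es = np.empty(n_features)
    for j in range(n_features):
        H, _ = stats.kruskal(*[g[:, j] for g in groups])
        es[j] = H / (n_total - 1)
    return es

rng = np.random.default_rng(7)
# Three simulated "diagnostic groups", two features:
# feature 0 separates the groups, feature 1 is pure noise
g1 = np.c_[rng.normal(0, 1, 80), rng.normal(0, 1, 80)]
g2 = np.c_[rng.normal(2, 1, 80), rng.normal(0, 1, 80)]
g3 = np.c_[rng.normal(4, 1, 80), rng.normal(0, 1, 80)]
es = kw_effect_sizes([g1, g2, g3])
```

Thresholding or top-k selection on `es` yields a reduced feature set without ever consulting the downstream classifier, which is what keeps the selection step interpretable and classifier-independent.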
Wang, S.; Ayubcha, C.; Hua, Y.; Beam, A.
Background: Developing generalizable neuroimaging models is often hindered by limited labeled data, which has led to increased interest in unsupervised inverse learning. Existing approaches often neglect geometric principles and struggle with diverse pathologies. We propose a symmetry-informed inverse learning foundation model to address these shortcomings for robust and efficient anomaly detection in brain MRI. Methods: Our framework employs a reconstruction-to-embedding pipeline, trained exclusively on healthy brain MRI slices. A 2D U-Net uses a novel, symmetry-aware masking strategy to reconstruct a disorder-free slice. Difference maps are embedded into a 1024-dimensional latent space via a Beta-VAE. Anomaly scoring is performed using Mahalanobis distance. We evaluated generalization by fine-tuning on external lesion datasets, BraTS Africa (SSA), and the ADNI-derived Alzheimer disease cohort (Alz). Results: On the source metastasis (Mets) dataset, the framework achieved high performance (AB1+MSE: 99.28% accuracy, 99.79% sensitivity). Generalization to the external lesion dataset (SSA) was robust, with the Symmetry ROC configuration achieving 91.93% accuracy. Transfer to the Alzheimer dataset (Alz) was more challenging, achieving a peak accuracy of 70.54% with a high false-positive rate, suggesting difficulty in separating subtle, diffuse changes. Conclusion: The symmetry-informed inverse learning framework establishes a robust foundation model for neuroimaging, showing strong performance for focal lesions and successful generalization under domain shift. Limitations in diffuse neurodegeneration underscore the necessity for richer representations and multimodal integration to improve future foundation models.
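Mahalanobis anomaly scoring as described above amounts to fitting a Gaussian to healthy latent embeddings and scoring new embeddings by their distance to it. A generic sketch with a low-dimensional toy latent (not the paper's 1024-d Beta-VAE space):

```python
import numpy as np

def fit_gaussian(Z):
    """Mean and inverse covariance of healthy latent embeddings,
    with a small ridge term for numerical stability."""
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(z, mu, cov_inv):
    """Anomaly score: Mahalanobis distance of z to the healthy Gaussian."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(5)
healthy = rng.normal(0, 1, (1000, 8))              # stand-in for healthy embeddings
mu, cov_inv = fit_gaussian(healthy)
score_normal = mahalanobis(rng.normal(0, 1, 8), mu, cov_inv)
score_anomalous = mahalanobis(rng.normal(0, 1, 8) + 6.0, mu, cov_inv)
```

A threshold on the score (e.g. a chi-distribution quantile of the healthy distances) then separates in-distribution slices from anomalies.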
Mayala, S.; Mzurikwao, D.; Suluba, E.
Deep learning classification on large datasets is often limited in countries with restricted computational resources. While transfer learning can offset these limitations, standard architectures often maintain a high memory footprint. This study introduces HybridNet-XR, a memory-efficient and computationally lightweight hybrid convolutional neural network (CNN) designed to bridge the domain gap in medical radiography using autonomous self-supervised learning protocols. The HybridNet-XR architecture integrates depthwise separable convolutions for parameter reduction, residual connections for gradient stability, and aggressive early downsampling to minimize the video RAM (VRAM) footprint. We evaluated several training paradigms, including teacher-free self-supervised learning (SSL-SimCLR), teacher-led knowledge distillation (KD), and domain-gap (DG) adaptation. Each variant was pre-trained on ImageNet-1k subsets and fine-tuned on the ChestX6 multi-class dataset. Model interpretability was validated through gradient-weighted class activation mapping (Grad-CAM). The performance frontier analysis identified the HybridNet-XR-150-PW (Pre-warmed) as the optimal configuration, achieving a 93.38% average accuracy and 99% AUC while utilizing only 814.80 MB of VRAM. Regarding class-wise accuracy, this variant significantly outperformed standard MobileNetV2 and teacher-led models in critical diagnostic categories, notably COVID-19 (97.98%) and Emphysema (96.80%). Grad-CAM visualizations confirmed that the teacher-free pre-warming phase allows the model to develop sharper, anatomically grounded focus on pathological landmarks compared to distilled models. Specialized pre-warming schedules offer a viable, computationally autonomous alternative to knowledge distillation for medical imaging.
By eliminating the requirement for high-performance teacher models, HybridNet-XR provides a robust and trustworthy diagnostic foundation suitable for clinical deployment in resource-constrained environments. Author summary: Traditional deep learning models for medical imaging are often too large for the low-power computers available in many global health settings. We developed a new model to bridge this computational gap. We designed HybridNet-XR, a highly efficient AI architecture, and trained it using a "teacher-free" method that doesn't require a massive supercomputer. We found a specific version (H-XR150-PW) that provides high accuracy while using very little memory. Our results show that high-performance diagnostic AI can be deployed on standard, low-cost hardware. Furthermore, using visual heatmaps (Grad-CAM), we proved that the AI correctly identifies medical landmarks like lung opacities, ensuring it is safe and reliable for real-world clinical use.
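The parameter reduction from depthwise separable convolutions behind HybridNet-XR's small footprint is easy to verify arithmetically: a standard k×k convolution costs k·k·C_in·C_out weights, whereas the separable version costs k·k·C_in (depthwise) plus C_in·C_out (pointwise). The layer sizes below are illustrative, not taken from the architecture.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise (one k x k filter per input channel) plus
    pointwise (1 x 1) convolution weights."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 128 -> 256 channels
std = conv_params(3, 128, 256)         # 9 * 128 * 256 = 294,912
sep = separable_params(3, 128, 256)    # 1,152 + 32,768  = 33,920
reduction = std / sep                  # between 8x and 9x fewer parameters
```

The same factorization reduces multiply-accumulate operations by roughly the same ratio, which is why it also shrinks the VRAM and compute budget at inference time.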
Li, T.; Wang, X.; Cole, M.; Sun, Z.; Jiang, Z.; Qian, X.; Gao, S.; Luo, T.; Descoteaux, M.; Stein, J. L.; Wang, X.; Nichols, T. E.; Zhang, H.; Zhang, Z.; Zhu, H.
Large-scale population analyses of structural connectome organization remain challenging because of cross-subject alignment, pathway interpretability and computational burden. No widely adopted standard exists for systematic evaluation across processing methods. We developed connectome-based spatial statistics (CBSS), a scalable framework for anatomically aligned and functionally informed quantification of white-matter microstructure that yields atlas-defined pathways organized into 13 functional networks. Using data from 56,510 UK Biobank participants together with five independent lifespan cohorts, we evaluated the streamline-, voxel- and network-level measures with respect to reliability, heritability, structure-function coupling, cognitive and behavioral prediction, brain aging patterns and lifespan trajectories across cohorts. The systematic evaluation workflow compares population-level white-matter representations across methods, spatial scales, tasks and datasets. The results support CBSS as a common connectome reference for large-scale, cross-cohort diffusion MRI studies.
Wan, Z.; Hossain, J.; Fu, W.; Gollo, L.; Wu, K.
Brain age prediction from neuroimaging data provides critical insights into neurodevelopmental trajectories and neurodegenerative processes. However, effectively leveraging complementary structural and functional brain information for accurate prediction remains a major challenge. In this study, we propose an Attention-guided Multimodal brain Age prediction Network (AMAge-Net), a novel framework that integrates resting-state functional MRI (fMRI) and structural MRI (sMRI) to enhance brain age estimation. In AMAge-Net, functional features are captured from fMRI through a hierarchical Graph Attention Network, while structural features are learned from sMRI via a 3D DenseNet architecture. To enable effective cross-modal integration, AMAge-Net incorporates a Multi-Head Cross-Attention mechanism followed by a Gated Fusion Module, allowing the model to dynamically prioritize the most informative features from each modality, thereby improving interpretability and predictive accuracy. Evaluation on the Cam-CAN dataset (652 participants, aged 18-89) demonstrates that AMAge-Net outperforms state-of-the-art unimodal and multimodal baselines, achieving a mean absolute error (MAE) of 5.09, root mean square error (RMSE) of 6.52, R2 of 0.87, and Pearson correlation (PCC) of 0.94. The proposed model further demonstrates robust generalization, achieving an MAE of 4.29, RMSE of 5.59, R2 of 0.58, and PCC of 0.77 on the independent OASIS-3 dataset. Comparative and ablation studies further confirm the effectiveness of the proposed fusion strategy and modality-specific encoders. Beyond predictive performance, AMAge-Net highlights interpretable brain regions that provide insights into the mechanisms of functional and structural brain aging, while gender-specific analyses reveal distinct aging trajectories between males and females. 
These findings establish AMAge-Net as a powerful and interpretable approach to brain age estimation, advancing efforts to characterize healthy aging and detect early deviations associated with neurological and psychiatric disorders.
Author summary: Estimating the biological age of the brain from imaging data offers a window into normal development, healthy aging, and the early stages of disease. A major challenge is how to combine information from structural scans, which show brain anatomy, and functional scans, which capture brain activity. Here, we present a new computational framework that integrates both types of data to improve the accuracy and interpretability of brain age prediction. Applied to two independent, large-scale lifespan magnetic resonance imaging datasets of individuals spanning early adulthood to late life, our framework produced highly accurate predictions and consistently outperformed existing methods. Beyond predictive performance, the model highlighted brain regions that appear especially important for age-related changes, and it revealed distinct aging patterns between men and women. These findings provide a powerful and interpretable tool for studying how the brain changes across the lifespan, with potential applications in detecting early deviations linked to neurological and psychiatric disorders.
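The error metrics reported in this abstract (MAE, RMSE, R2, Pearson correlation) follow their standard definitions. A minimal numpy sketch, where `brain_age_metrics` is a hypothetical helper name, not code from the paper:

```python
import numpy as np

def brain_age_metrics(y_true, y_pred):
    """Standard regression metrics used to score brain-age predictions."""
    err = y_pred - y_true
    mae = np.abs(err).mean()                        # mean absolute error
    rmse = np.sqrt((err ** 2).mean())               # root mean square error
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    pcc = np.corrcoef(y_true, y_pred)[0, 1]         # Pearson correlation
    return mae, rmse, r2, pcc

# toy example: predicted vs. chronological ages in years
mae, rmse, r2, pcc = brain_age_metrics(
    np.array([20.0, 40.0, 60.0, 80.0]),
    np.array([22.0, 38.0, 63.0, 79.0]),
)
```

Note that R2 and PCC measure different things: R2 penalizes any systematic bias in the predictions, while PCC only captures linear association, which is why the two can diverge on an external cohort such as OASIS-3 (R2 of 0.58 vs. PCC of 0.77).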
Zhou, J.; Miller, R. J.; Shanbhag, A.; Killekar, A.; Han, D.; Patel, K. K.; Pieszko, K.; Yi, J.; Urs, M. K.; Ramirez, G.; Lemley, M.; Kavanagh, P. B.; Liang, J. X.; Kamagate, A.; Builoff, V.; Einstein, A. J.; Feher, A.; Miller, E. J.; Sinusas, A. J.; Ruddy, T. D.; Knight, S.; Le, V. T.; Mason, S.; Chareonthaitawee, P.; Wopperer, S.; Alexanderson, E.; Carvajal-Juarez, I.; Rosamond, T. L.; Slipczuk, L.; Travin, M. I.; Packard, R. R.; Acampa, W.; Al-Mallah, M.; deKemp, R. A.; Buechel, R. R.; Berman, D. S.; Dey, D.; Di Carli, M. F.; Slomka, P. J.
Purpose: The spatial distribution of coronary artery calcium (CAC) may provide additional prognostic value in patients undergoing SPECT and PET myocardial perfusion imaging (MPI). We aimed to automatically identify CAC in proximal segments from attenuation correction CT (CTAC) scans using artificial intelligence (AI) and to evaluate its prognostic significance in two large international multicenter registries. Methods: From hybrid MPI/CT imaging (N=43,099) across 15 sites, we included 4,552 patients with 1) no prior coronary artery disease; 2) AI-derived mild CAC scores (1-99); and 3) normal perfusion (stress total perfusion deficit <5%). The independent associations between AI-identified proximal CAC and major adverse cardiovascular events (MACE) and all-cause mortality (ACM) were evaluated using multivariable Cox regression, the likelihood ratio test (LRT), and the continuous net reclassification index (NRI). Results: Among patients with mild CAC and normal perfusion (mean age 65±12 years, 51% male), 1,730 (38%) had proximal CAC. Over a median follow-up of 3.6 years (interquartile range 2.1-5.2), 599 (13%) and 444 (10%) patients had MACE or ACM, respectively. Proximal CAC was associated with an increased risk of MACE (adjusted hazard ratio [HR] 1.24, 95% CI 1.03-1.48, P=0.02) and ACM (adjusted HR 1.25, 95% CI 1.01-1.53, P=0.04) after adjustment for CAC score and density, clinical risk factors, and perfusion deficit. Proximal CAC improved risk stratification for MACE (LRT P=0.02; NRI 12%) and ACM (LRT P=0.04; NRI 12%). Conclusion: In patients with mild CAC and normal perfusion, AI detection of proximal CAC identified a higher-risk group for adverse outcomes, highlighting its prognostic utility.
Arab, F.; Sipes, B. S.; Nagarajan, S. S.; Raj, A.
Global Signal Removal (GSR) is a widely applied step in functional magnetic resonance imaging (fMRI) preprocessing. Although GSR conventionally denotes Global Signal Regression, we use Global Signal Removal to encompass a broader family of spatial filtering operations. GSR remains controversial due to concerns about introducing spurious anticorrelations and removing neurally meaningful signals. In this paper, we provide a precise geometric characterization by formalizing GSR as graph spatial filtering. We demonstrate that the most common form of GSR, Regression-GSR, is equivalent to a rank-1 deflation of the covariance matrix (i.e., functional connectivity) by the degree vector. Empirically, the degree vector is dominated by the first principal component of the functional connectivity matrix (correlation = 0.88 ± 0.12 in resting-state HCP data), making Regression-GSR an approximation to first-eigenmode removal. Viewing GSR as a spatial projection allows us to develop a family of GSR variants, each expressible as a unified spatial filter: Naive-GSR removes the uniform vector; PCA-GSR precisely removes the first eigenvector; and SC-GSR, a new variant we introduce, removes the first harmonic of the structural connectivity matrix. A key distinction emerges: while Naive-, PCA-, and SC-GSR are orthogonal projections, Regression-GSR is an oblique projection that computes regional weights proportional to the degree vector but removes a spatially uniform signal. All GSR variants induce numerical singularity in the covariance matrix, but they differ in their effects on task-state separability, which we examine empirically. In summary, we reframe GSR as a family of graph spatial filters, making its effects on network connectivity interpretable and showing how those effects vary systematically across variants.
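The oblique-versus-orthogonal distinction the abstract draws can be written in a few lines. A minimal numpy sketch on synthetic data (not the HCP pipeline), contrasting Regression-GSR's oblique rank-1 removal with the orthogonal Naive-GSR projection:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 200
X = rng.standard_normal((N, T))        # regions x time, synthetic data

# Regression-GSR: regress the global mean time series out of every region.
g = X.mean(axis=0)                     # global signal
beta = X @ g / (g @ g)                 # per-region regression weights
X_reg = X - np.outer(beta, g)          # oblique rank-1 removal

# Naive-GSR: orthogonal projection removing the spatially uniform vector.
u = np.ones(N) / np.sqrt(N)
X_naive = X - np.outer(u, u @ X)
```

After Regression-GSR every region's residual is exactly orthogonal to the global signal (`X_reg @ g` vanishes), yet the weights `beta` differ across regions while the removed temporal pattern is the same `g` everywhere, which is the oblique-projection behavior described above; Naive-GSR instead removes exactly the spatial pattern it projects onto.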
Neves, C.; Steele, C. J.; Xiao, Y.
Resting-state electroencephalography (rs-EEG) offers a cost-effective and portable alternative to conventional neuroimaging for dementia screening, yet the lengthy, multichannel nature of rs-EEG makes learning robust representations challenging. Convolutional and Transformer-based architectures dominate current deep-learning approaches, but they often struggle with long-range dependencies and may not properly preserve channel-dependent features. In this work, we propose EEG-ChiMamba, a state-space-model-based architecture designed to classify mild cognitive impairment (MCI) and dementia against normal controls using raw channel-independent rs-EEG signals. Our method decouples channel-wise representation learning from the modeling of cross-channel interactions and leverages Mamba layers for effective long-sequence modeling. We evaluate our method on the Chung-Ang University EEG dataset (CAUEEG) with 1,155 subjects, the largest public rs-EEG dataset for the challenging differential diagnosis of MCI and dementia. We achieve a 3-class accuracy of 57.65% using a strict subject-wise split, and we relate task-specific features learned by our model, as revealed by occlusion-based explainability techniques, to the clinical literature, highlighting that state space models can facilitate interpretable and scalable clinical rs-EEG screening tools for cognitive degeneration. The code for the study is publicly available at: https://github.com/HealthX-Lab/EEG-ChiMamba
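Channel-independent sequence modeling of the kind this abstract describes is commonly implemented by folding the channel axis into the batch axis, so one shared encoder processes each channel as its own 1-D sequence before a second module mixes channels. A generic numpy sketch of that reshaping pattern, with a placeholder encoder, not the authors' code:

```python
import numpy as np

B, C, T, D = 4, 19, 1000, 16           # batch, EEG channels, timepoints, feature dim
rng = np.random.default_rng(1)
x = rng.standard_normal((B, C, T))     # raw multichannel rs-EEG (synthetic)

# Stage 1 - channel-independent encoding: fold channels into the batch
# axis so one shared encoder sees each channel as its own sequence.
x_ci = x.reshape(B * C, T)
W = rng.standard_normal((T, D)) / np.sqrt(T)
h = np.tanh(x_ci @ W)                  # placeholder for a per-channel encoder

# Stage 2 - cross-channel interaction: unfold back to (B, C, D) so a
# second module (e.g., attention over channels) can mix channel features.
h = h.reshape(B, C, D)
```

The design choice matters because weight sharing across channels keeps the encoder's parameter count independent of the montage size, while deferring cross-channel mixing to a later, cheaper stage.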
Gonzalez-Castillo, J.; Caballero Gaudes, C.; Handwerker, D. A.; Bandettini, P. A.
Consistent, high-quality data is key to the success of fMRI studies given the many confounding factors and undesired signals that contaminate these data. Several quality assurance (QA) metrics exist for fMRI (e.g., temporal signal-to-noise ratio (TSNR), percent ghosting, motion estimates), but none of them leverage the relationships between echoes that are part of multi-echo (ME) fMRI acquisitions. Here, we fill this gap by proposing a new QA metric for ME-fMRI that quantifies the likelihood that a given ME scan is dominated by BOLD (Blood Oxygenation Level-Dependent) fluctuations. We refer to this metric as pBOLD: the probability that the signal change is primarily BOLD contrast-dominated. Having an estimate of overall BOLD weighting - both before and after preprocessing - is meaningful because BOLD is the intrinsic contrast mechanism used in fMRI to infer neural activity. We introduce pBOLD to the neuroimaging community by first describing the theoretical principles supporting the metric. Next, we validate the efficacy of pBOLD using a small dataset (N=7 scans) of constant- and cardiac-gated scans that have distinct levels of contributing BOLD fluctuations. Third, we apply pBOLD to a larger publicly available ME dataset (N=439 scans) to evaluate six different preprocessing pipelines and show how pBOLD provides complementary information to TSNR. Our results show that ME-based denoising increases both pBOLD and TSNR relative to basic denoising; however, including the global signal (GS) as a regressor improves only TSNR and worsens pBOLD. Further analyses of the BOLD-like characteristics of the GS and its relationship to cardiac and respiratory traces suggest that the observed decrease in pBOLD is likely due to a decrease in BOLD fluctuations of neural origin contributing to the GS, and not due to contributions from other physiological BOLD fluctuations (i.e., respiratory and cardiac function).
Finally, we demonstrate how pBOLD can be applied as a data quality metric by showing that scans with higher pBOLD support better phenotype prediction from whole-brain functional connectivity matrices.
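pBOLD itself is defined in the paper, but the TSNR it is compared against is a standard quantity: the temporal mean divided by the temporal standard deviation, computed per voxel. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
V, T = 1000, 300                              # voxels, timepoints
baseline = 100.0                              # mean signal intensity
data = baseline + rng.standard_normal((V, T)) # voxels x time, synthetic

# per-voxel temporal SNR: mean over time divided by std over time
tsnr = data.mean(axis=1) / data.std(axis=1)
```

Because TSNR rewards any reduction in temporal variance regardless of its source, a regressor such as the global signal can raise TSNR while removing BOLD-weighted fluctuations, which is exactly the dissociation between TSNR and pBOLD this abstract reports.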